Readme
Replication Information for
Can Party Elites Shape the Rank-and-File? Evidence from a Recruitment Campaign in India

This replication package contains the following items: 

1) R scripts that create the main analysis dataset: 

	A) data_merge.R merges a dataset on compliance and the number of pamphlets received by mid-level party members (compliance.csv) with the onboarding survey (onboarding_survey.csv) and the long-term retention survey (recall_survey.csv). In addition, this R script creates the dataset containing only survey responses for the baseline treatment. It outputs pamphlets_level.csv, which var_create.R then uses to create the main variables. 

	B) var_create.R creates the variables for the analysis dataset and outputs aap_pamphlets_replication.csv.
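
As a rough illustration of the kind of merge that data_merge.R performs, the sketch below left-joins toy compliance records to toy survey responses and writes the result to a CSV file. The key column `join_id`, the other column names, and all values are hypothetical; the actual script may use different keys and join logic.

```r
# Toy inputs; the key column `join_id` and all values are hypothetical.
compliance <- data.frame(
  join_id = c(1, 2, 3),
  pamphlets_received = c(10, 10, 5)
)
onboarding <- data.frame(
  join_id = c(1, 3),
  onboarding_response = c(TRUE, TRUE)
)
recall <- data.frame(
  join_id = 3,
  recall_response = TRUE
)

# Left joins keep every compliance record, with NA where no survey matched.
merged <- merge(compliance, onboarding, by = "join_id", all.x = TRUE)
merged <- merge(merged, recall, by = "join_id", all.x = TRUE)

# Write the merged data to a temporary copy of pamphlets_level.csv.
out_file <- file.path(tempdir(), "pamphlets_level.csv")
write.csv(merged, out_file, row.names = FALSE)
```

The file is written to a temporary directory here so that the sketch cannot overwrite the real pamphlets_level.csv.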

2) R scripts that run the analysis for the main outcomes: 
The script `aap_analysis_late_replication.R` runs all analyses in the correct order. You can run it on your own machine or on a high-performance computing cluster (HPC). We recommend using an HPC because the analysis takes a long time on a personal computer.

	- You need to set the working directory and the path where your R libraries are located. Running .libPaths()[1] in R should give you the correct path; copy and paste that library location where indicated in the R script. If the code does not work, please check the R library location on your computer or on the HPC. It should be a path similar to "/home/ar8787/R/x86_64-redhat-linux-gnu-library/4.3". We use many different packages, whose names and versions are listed in the script. The script installs them in the correct version. If you are going to send `aap_analysis_late_replication.R` to a cluster to run, please ensure that your job has an internet connection. Packages need to be installed prior to running the replication script, following the instructions of your cluster. We highly recommend using a cluster: on a regular computer the analysis takes more than 7 days, even with a very high-performance chip such as the M2 Max.

	- aap_analysis_late.sbatch is an example of a shell file that runs `aap_analysis_late_replication.R` on the cluster. Before running this file, you need to set the working directory and library location in the `aap_analysis_late_replication.R` script. When running the .sbatch file on a computer cluster, please adjust the number of cores, the total memory, the modules to be loaded, and the total run time. We recommend requesting at least 48 hours of run time.
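
The working-directory and library-path setup described above might look like the following minimal sketch. The variable names `lib_path` and `work_dir` are illustrative assumptions, not names taken from the replication script, and `work_dir` is set to the current directory only as a placeholder.

```r
# Library location reported by the current R session; on a cluster this
# should resemble the example path given above.
lib_path <- .libPaths()[1]
print(lib_path)

# Hypothetical working directory: replace with the folder that holds the
# replication scripts and data.
work_dir <- getwd()
setwd(work_dir)

# Put the chosen library first so packages are installed and loaded there.
.libPaths(c(lib_path, .libPaths()))

# Fail early if the library location does not exist.
stopifnot(dir.exists(lib_path))
```

Checking that the directory exists before the long analysis starts can save a failed cluster job.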


	A) aap_analysis_late.R runs the analysis for Table 2, Table 4, Table D.1, Table E.2, Table F.6, Table G.9. It outputs `newmembers_coefficients.csv`, which is used by `aap_analysis_late_coef_plot.R`. 

	B) aap_analysis_late_excluded.R runs the analysis for the diversity of the recruit pool outcomes: Table 5, Table G.7, Table G.11. It outputs `excluded_coefficients.csv` and `excluded_fem_coefficients.csv`, which are used by `aap_analysis_late_coef_plot.R`.

	C) aap_analysis_late_skills.R runs the analysis for the skills of the recruit pool outcomes: Table G.8, Table G.12, Table G.13. It outputs `skills_coefficients.csv` and `skills_fem_coefficients.csv`, which are used by `aap_analysis_late_coef_plot.R`.

	D) aap_analysis_late_recall_survey.R runs the analysis for the long-term retention survey outcomes: Tables G.14-G.16. It outputs `recall_fem_coefficients.csv`, `recall_coefficients.csv`, `recall_excluded_fem_coefficients.csv`, `recall_excluded_coefficients.csv`, `recall_skills_fem_coefficients.csv`, `recall_skills_coefficients.csv`. All are used by `aap_analysis_late_coef_plot.R`. 

	E) aap_analysis_interaction.R runs the analysis for Table G.10 and generates the inputs for Figure 6 and Figure G.9. It outputs `newmembers_interaction_coefficients.csv`, `excluded_group_interaction_coefficients.csv`, `skill_index_interaction_coefficients.csv`. 

	F) aap_analysis_late_coef_plot.R uses output from A)-E) to create the coefficient plot figures: Figures 5-6 and Figure G.9.

	G) aap_analysis_simulation.R creates Figure E.8.
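
Because aap_analysis_late_coef_plot.R depends on the coefficient files written by scripts A)-E), it can help to confirm that they all exist before plotting. The sketch below is one way to do this; the helper `missing_outputs` is hypothetical and not part of the package, and it assumes the CSV files are written to the working directory.

```r
# Coefficient files listed above as outputs of scripts A)-E).
expected_outputs <- c(
  "newmembers_coefficients.csv",
  "excluded_coefficients.csv",
  "excluded_fem_coefficients.csv",
  "skills_coefficients.csv",
  "skills_fem_coefficients.csv",
  "recall_fem_coefficients.csv",
  "recall_coefficients.csv",
  "recall_excluded_fem_coefficients.csv",
  "recall_excluded_coefficients.csv",
  "recall_skills_fem_coefficients.csv",
  "recall_skills_coefficients.csv",
  "newmembers_interaction_coefficients.csv",
  "excluded_group_interaction_coefficients.csv",
  "skill_index_interaction_coefficients.csv"
)

# Hypothetical helper: return the expected files that are not present in `dir`.
missing_outputs <- function(dir = ".") {
  expected_outputs[!file.exists(file.path(dir, expected_outputs))]
}

# Assumption: the analysis scripts wrote their CSV files to the working directory.
missing <- missing_outputs(".")
if (length(missing) > 0) {
  message("Missing inputs for the coefficient plots: ",
          paste(missing, collapse = ", "))
} else {
  message("All coefficient files found.")
}
```

If the scripts write their output elsewhere, pass that folder to `missing_outputs()` instead of ".".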


3) R scripts that run supplementary analysis: 

	A) compliance_calc.R runs the analysis for Table 3, Table E.3, and Figures E.5-E.7.

	B) DescriptiveStatsPlot.R creates Figure A.3.

4) Datasets necessary for creating the main analysis dataset: 

	A) compliance.csv is a dataset that includes details about mid-level members' (vice presidents) treatment assignment and which kinds of pamphlets they received. 

	B) onboarding_survey.csv is a dataset of survey respondents that AAP was able to contact after the recruitment campaign. 

	C) recall_survey.csv is a dataset of respondents who were recontacted by AAP in the long-term retention survey.

5) Datasets necessary for the analysis: 

	A) aap_pamphlets_replication.csv is the main dataset. The unit of observation is the pamphlet. It includes information on the treatment status of the pamphlet and whether the pamphlet received a callback from potential volunteers, as well as the characteristics of those potential volunteers. 

	B) compliance.csv is a dataset that includes details about mid-level members' (vice presidents) treatment assignment and which kinds of pamphlets they received. 

	C) compliance_timeline.csv is a dataset that includes details about the timeline of pamphlet distribution for each mid-level party member.

	D) treatment_assignment.csv is a dataset of treatment assignment for each assembly constituency. Column names are assembly constituency names. Each row represents the treatment assignment for a mid-level party member. The numeric IDs of treatments in each row were used on sign-up sheets for mid-level party members to assign them to different treatment pamphlets.

	E) vpsurvey.csv includes information from a survey with mid-level party members. 

	F) census_2011.csv is a dataset of all villages in Jharkhand with demographic information drawn from the Census of India 2011.

	G) baseline_observation.csv is a dataset of responses from the "missed call survey" with potential volunteers. The survey only includes respondents who called a number associated with the baseline pamphlet.



	